Back to content page

Learning objective

  • Understand the rationale and assumptions of regression discontinuity design
  • Recognize situations where regression discontinuity design can be used to estimate causal effect
  • Estimate and interpret the treatment effect under regression discontinuity design

1 Regression discontinuity design

Concept

Concept

  • Draw causal estimate from observational study based on some assumption => Non randomized/quasi-experimental/pre-post test design
    • Widely used in different fields: economics, politics, medicine, epidemiology and education
    • Example: program evaluation (before and after), treatment assignment (treatment and control) (Any difference around the cut-off can be attributed to treatment coz ppl around cut-off are similar)

Assumptions

  • Known and universal principle for treatment assignment, e.g. start ART when CD4 count <= 200 cells/mm3
  • Assignment criteria shd be measured prior to treatment assignment, e.g.  measure CD4 count before assigning ART
  • There is an interval/assignment/forcing variable in which a cut-off/ discontinuity defines the treatment assignment, e.g. 200 is the cut-off
    • Deterministic: Sharp RD (assume perfect compliance towards assignment)
    • Probabilistic: Fuzzy RD
  • Many units fall on either side of the cut-off
  • Continuity in potential outcome at the cut-off, e.g. mortality just <200 and just >200 should not have a large difference
Possible forms of discontinuity

Possible forms of discontinuity

Example 1

Example 2

LR Model

Linear regression model

where Zi is the assignment variable; c is the cut-off

where Zi is the assignment variable; c is the cut-off

Global/local approach

Estimation strategy

  1. Parametric/global approach
    • Use all observations to help define the global function for estimating effect at cut-off
    • Different functional forms (e.g. linear, quadratic, cubic, and interactions with treatment) can be used to minimize bias
    • Include polynomials for curvilinear relationship
  2. Nonparametric/local approach
    • Limit analysis to observations near the cut-off (‘within the bandwidth’)
    • In the local region, linear relation can be a good approximation. OR With adequate data, just compare means around cut-off (model free)
    • Local regression for curvilinear relationship

Local linear regression/Kernel smoothing

  • A technique to estimate a smoothing function. Non-parametric method coz don’t fit any model to the data.
  • Use local observations to fit linear regression. (Fit a regression model for each observation) Can be extended as local polynomial regression.
  • At each observation x0, minimize \(\sum_i K_h(x_0,x_i)(y_i-\alpha-\beta x_i)^2\) where K_h is a kernel (weighting) function with bandwidth h
    1. Kernel function controls weighting
    2. Bandwidth controls how many data points are included
      • Wider bandwidth = smoother plot, less accurate (more bias)
      • Choice of bandwidth is more important than choice of kernel functions (coz different kernel functions share the same characteristics/concept)
      • Use ‘plug-in’ (theoretically optimized) or cross-validation method to choose bandwidth (implemented in r)

Different Kernel function

Different Kernel function

Bandwidth = 2

Bandwidth = 2

2 RDD Example

Data info

  • To test the effect of minimum legal drinking age (MLDA) on alcohol-related suicides (per 1,000 hospital episodes) based on administrative records on inpatients and from emergency department
  • In this case, age is the assignment variable (cut-off for different ppl) => assign ppl to 2 groups
alc <- read.csv("Data/examplealcohol.csv")
alc$MLDA <- 1*(alc$month>=0) #assign ppl to 2 groups

Global linear

Suppose we hypothesize that there could be intercept and slope change due to MLDA. Fit a model for regression discontinuity design (global approach): \(suicide = \alpha + \beta_1MLDA_i + \beta_2month_i + \beta_3MLDA_i \times month_i + \varepsilon_i\)

## Fit a global model
rdl <- lm(alc.suicide ~ MLDA + month + MLDA*month, data=alc)
summary(rdl)
# There was a 0.14 (95% CI 0.04 0.24) excess alcohol-related suicide per
# 1000 hospital episodes among youths just older than MLDA

## The is no significant change in slope and the interaction term can be dropped
rd1b <- lm(alc.suicide ~ MLDA + month, data=alc)
summary(rd1b)
# There was a 0.14 (95% CI 0.04 0.24) excess alcohol-related suicide per
# 1000 hospital episodes among youths just older than MLDA
## 
## Call:
## lm(formula = alc.suicide ~ MLDA + month + MLDA * month, data = alc)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.287538 -0.066763  0.000769  0.062755  0.300138 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.3228889  0.0366951   8.799 6.92e-13 ***
## MLDA         0.1427157  0.0505295   2.824  0.00619 ** 
## month       -0.0007748  0.0017295  -0.448  0.65557    
## MLDA:month  -0.0011432  0.0023971  -0.477  0.63494    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1078 on 69 degrees of freedom
## Multiple R-squared:  0.1822, Adjusted R-squared:  0.1466 
## F-statistic: 5.123 on 3 and 69 DF,  p-value: 0.002942
## 
## 
## Call:
## lm(formula = alc.suicide ~ MLDA + month, data = alc)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.280099 -0.071606  0.004261  0.064833  0.305791 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.311880   0.028366  10.995  < 2e-16 ***
## MLDA         0.143859   0.050193   2.866  0.00548 ** 
## month       -0.001370   0.001191  -1.150  0.25395    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1072 on 70 degrees of freedom
## Multiple R-squared:  0.1795, Adjusted R-squared:  0.156 
## F-statistic: 7.655 on 2 and 70 DF,  p-value: 0.000985

Plot of data; Note the discontinuity in intercept

Plot of data; Note the discontinuity in intercept

Global polynomial

Non-linear/curvilinear relationship might explain the gap => have polynomial like 2nd or 3rd degree

  • I(): treat variable ‘as is’, to differentiate from formula operators (used to stop operator characters (^,+,*, etc) from being interpreted as they are in a formula). For example plot(y~x^3) vs plot(y~I(x^3))
  • poly(): compute orthogonal polynomials to avoid multicollinearity
rd1c <- lm(alc.suicide ~ MLDA + month + I(month^2) + I(month^3), data=alc)
#OR
rd1c <- lm(alc.suicide ~ MLDA + poly(month, degree=3), data=alc)
# The estimate changed slightly from 0.14 to 0.12 excess alcohol-related suicide per 1000 # hospital episodes among youths just older than MLDA (though less precise) (no longer 
# significant)

Plot of data

Plot of data

Local (np)

Local linear/polynomial regression

Package: np
Usually involve two steps: (1) computing the bandwidth; (2) fitting the local polynomial regression model

  • npregbw(y ~ x, bws, ckertype, regtype, bandwidth.compute=T, data)
    • x …explanatory variables
    • bws …pre-determined bandwidth
    • ckertype…continuous kernel type, e.g. ‘gaussian’, ‘epanechnikov’, ‘uniform’
    • regtype = ll generates a local linear estimator, ‘lc’ for local constant estimator
    • bandwidth.compute = TRUE to search for suitable bandwidths
  • npreg(bws)
    • bws is the bandwidth specification from npregbw()
require(np)
data <- read.csv ("Data/digoxin.csv")
plot(data$age, data$bmi, type="p", xlab="Age", ylab="BMI", las=1)

## Local linear regression
bmi.bw <- npregbw(bmi~age, bandwidth.compute=TRUE,ckertype="gaussian", regtype="ll", data=data)
bmi.lp <- npreg(bws = bmi.bw)
## Visualize regression
with(data, plot(age[order(age)], predict(bmi.lp)[order(age)], type='l', las=1, 
                xlab="Age", ylab="BMI", xlim=c(30,90), ylim=c(15,45)))
with(data, points(age, bmi, pch=19, col="red"))

## 
Multistart 1 of 1 |
Multistart 1 of 1 |
Multistart 1 of 1 |
Multistart 1 of 1 /
Multistart 1 of 1 |
Multistart 1 of 1 |
                   

Bandwidth importance

Bandwidth importance

Local (rdrobust)

Local linear/polynomial regression

Packages: rdrobust

  • rdrobust(y, x, c = 0, p = 1, h = NULL, kernel = "tri", bwselect = "mserd", all = FALSE)
    • y, x …outcome and assignment variables
    • c …cut-off in x
    • p …order of the local polynomial, p = 1 refers to local linear regression
    • h …bandwidth. If not specified, the bandwidth will be selected by bwselect
    • kernel…kernel function, e.g. ‘triangular, ’epanechnikov’, ‘uniform’
    • bwselect…algorithm to select bandwidth, e.g. ‘mserd’ minimizes the mean squared error (MSE)
    • all = TRUE to show conventional and bias-corrected / robust estimates
library(rdrobust)
rd.l <- rdrobust(alc$alc.suicide, alc$month, all=T)
rd.l #not much useful info
## Call: rdrobust
## 
## Number of Obs.                   73
## BW type                       mserd
## Kernel                   Triangular
## VCE method                       NN
## 
## Number of Obs.                   36           37
## Eff. Number of Obs.               9           10
## Order est. (p)                    1            1
## Order bias  (q)                   2            2
## BW est. (h)                   9.240        9.240
## BW bias (b)                  14.336       14.336
## rho (h/b)                     0.645        0.645
## Unique Obs.                      36           37
summary(rd.l)
## Sharp RD estimates using local polynomial regression.
## 
## Number of Obs.                   73
## BW type                       mserd
## Kernel                   Triangular
## VCE method                       NN
## 
## Number of Obs.                   36           37
## Eff. Number of Obs.               9           10
## Order est. (p)                    1            1
## Order bias  (q)                   2            2
## BW est. (h)                   9.240        9.240
## BW bias (b)                  14.336       14.336
## rho (h/b)                     0.645        0.645
## Unique Obs.                      36           37
## 
## =============================================================================
##         Method     Coef. Std. Err.         z     P>|z|      [ 95% C.I. ]       
## =============================================================================
##   Conventional     0.098     0.156     0.632     0.528    [-0.207 , 0.403]     
## Bias-Corrected     0.078     0.156     0.501     0.616    [-0.227 , 0.383]     
##         Robust     0.078     0.194     0.402     0.687    [-0.302 , 0.458]     
## =============================================================================
# There was an estimated 0.08 (95% CI -0.30 0.46) excess alcohol-related 
# suicide per 1000 hospital episodes among youths just older than MLDA
# Bias correction is needed due to the bandwidth selection. 
# Note the wide 95% CI
# Selected bandwidth was about h = 9 which significantly reduced the sample size

Global (rdrobust)

## Set h widest and kenrel as uniform to use global approach
rd.g <- rdrobust(alc$alc.suicide, alc$month, h=100, kernel='uniform', all=T)
summary(rd.g)
## Sharp RD estimates using local polynomial regression.
## 
## Number of Obs.                   73
## BW type                      Manual
## Kernel                      Uniform
## VCE method                       NN
## 
## Number of Obs.                   36           37
## Eff. Number of Obs.              36           37
## Order est. (p)                    1            1
## Order bias  (q)                   2            2
## BW est. (h)                 100.000      100.000
## BW bias (b)                 100.000      100.000
## rho (h/b)                     1.000        1.000
## Unique Obs.                      36           37
## 
## =============================================================================
##         Method     Coef. Std. Err.         z     P>|z|      [ 95% C.I. ]       
## =============================================================================
##   Conventional     0.143     0.063     2.248     0.025     [0.018 , 0.267]     
## Bias-Corrected     0.127     0.063     1.997     0.046     [0.002 , 0.251]     
##         Robust     0.127     0.101     1.250     0.211    [-0.072 , 0.326]     
## =============================================================================
# The estimated effect is 0.14 (95% CI = 0.02 0.27), the bias is caused by the bandwidth
# selection, therefore conventional approach could be used when the bandwidth is pre-defined.

3 Assessing RDD assumptions

1 Assignment

Known and universal principle for treatment assignment

  • Plot assignment/forcing variable against treatment status

2 Assignment

Universal principle for treatment assignment. Make sure no manipulation of treatment status (no ‘bunching’)

  • Plot distribution of assignment/forcing variable
  • Plot distribution of other covariates across the cut-off

3 Continuity

Continuity in potential outcomes at the cut-off.

  • Balance (continuity) of covariate. Subject characteristics similar on each side of the cut-off.
  • Cut-off should not be determined by discontinuity in the relation between Z and Y, e.g. cut-off to match the initiation timing of a national wide psychological health program.

4 Model choice

Visual inspection will inform the model choice

  • Plot the outcome against assignment variable
  • An obvious ’jumpis expected if there is significant treatment effect

Correct specification of model - Sensitivity analysis by allowing curvilinear relation/local regression (allow for non-linear relationship by polynomials + global/local approach)

Sidenote

Fuzzy regression discontinuity

Some subjects may not receive the assigned treatment. So treatment was partially defined by the assignment variable. Treatment effect can be estimated by two-stage least square (2SLS) method:

The assignment variable as instrumental variable. Can be fitted using the “fuzzy” option in rdrobust.

Strength and limitation

  • Strengths
    • Required relatively weak assumptions compared to other quasiexperimental design (to find out causal relationship)
    • Provide strong evidence of causal effect if assumptions are satisfied
    • Key assumptions can be supported by the study data (can be tested)
  • Limitations
    • Need correctly specified model to obtain unbiased estimate (linear, quadratic etc)
    • Need larger sample size than RCT (esp local approach)
    • Curvilinearity may explain discontinuity if the model is misspecified
    • Results generalizable around the cut-off but not globally unless strong assumptions are made

Real-life example

Subject characteristic

Assignment

Result